A method and system for redundancy management is provided for a distributed and recoverable digital control system.
The method uses unique redundancy management techniques 
to achieve recovery and restoration of redundant elements to 
full operation in an asynchronous environment. The system 
includes a first computing unit comprising a pair of redundant 
computational lanes for generating redundant control com- 
mands. One or more internal monitors detect data errors in the 
control commands, and provide a recovery trigger to the first 
computing unit. A second redundant computing unit provides 
the same features as the first computing unit. A first actuator 
control unit is configured to provide blending and monitoring 
of the control commands from the first and second computing 
units, and to provide a recovery trigger to each of the first and 
second computing units. A second actuator control unit pro- 
vides the same features as the first actuator control unit. 
METHOD AND SYSTEM FOR REDUNDANCY
MANAGEMENT OF DISTRIBUTED AND 
RECOVERABLE DIGITAL CONTROL 
SYSTEM 

This application claims the benefit of priority to U.S. Pro- 
visional Application No. 60/705,843, filed on Aug. 5, 2005, 
which is incorporated herein by reference. The present application
is related to U.S. patent application Ser. No. 11/381,608,
filed May 4, 2006, and to U.S. patent application Ser. No.
11/381,637, filed May 4, 2006, both of which are incorporated
herein by reference.

The U.S. Government may have certain rights in the 
present invention as provided for by the terms of Contract No. 
CRA NCC-1-393 with NASA.

BACKGROUND TECHNOLOGY 

Computers have been used in digital control systems in a 
variety of applications, such as in industrial, aerospace, medical,
scientific research, and other fields. In such control systems,
it is important to maintain the integrity of the data
produced by a computer. In conventional control systems, a 
computing unit for a plant is typically designed such that the 
resulting closed loop system exhibits stability, low-frequency
command tracking, low-frequency disturbance rejection, and 
high-frequency noise attenuation. The “plant” can be any 
object, process, or other parameter capable of being con- 
trolled, such as an aircraft, spacecraft, medical equipment, 
electrical power generation, industrial automation, valve,
boiler, actuator, or other device. A control effector is used to 
provoke a response by the plant. For example, when the plant 
is an aircraft, control effectors may be in the form of flight 
control surfaces such as rudders, ailerons, and/or elevators. 

Various types of failures or faults may be encountered by
conventional computing units found in control systems. A
“hard fault” is a fault condition typically caused by a permanent
failure of the analog or digital circuitry. For digital circuitry,
a “soft fault” is typically caused by transient phenomena
that may affect some digital circuit computing elements,
resulting in computation disruption, but does not permanently 
damage or alter the subsequent operation of the circuitry. 

Soft faults may be caused by electromagnetic fields created 
by high-frequency signals propagating through the computing
system. Soft faults may also result from spurious intense
electromagnetic signals, such as those caused by lightning,
which induce electrical transients on system lines and data buses
that propagate to internal digital circuitry, setting latches
into erroneous states. In addition to lightning, other elements
of the electromagnetic environment (EME), such as high-intensity
radiated fields (HIRF), radio communications, radar
pulses, and the intense fields associated with electromagnetic 
pulses (EMP) may also cause soft faults. Further, high-energy 
atomic particles from a variety of sources (e.g., atmospheric 
neutrons, cosmic radiation, weapon detonation, etc.) may
deposit sufficient energy in the bulk semiconductor material 
of a digital device to set electronic circuits into erroneous 
states. With the advent of smaller integrated circuits running 
at high speeds, soft faults are becoming more common, such
as in the radiation environment encountered by aircraft traveling
at high altitudes. In such an environment, computing
circuits containing state-of-the-art digital devices may be 
more susceptible to failure. 

In conventional control systems, various forms of redundancy
have been used in an attempt to reduce the effects of
faults in critical systems. Multiple processing units, for
example, may be used within a computing system. In a system
with three processing units, for example, if one processor is
determined to be experiencing a fault, that processor may be 
isolated and/or shut down. The fault may be corrected by 
correct data (such as the current values of various control state 
variables) being transmitted (or “transfused”) from the 
remaining processors to the isolated unit. If the faults in the 
isolated unit are corrected, the processing unit may be re- 
introduced to the computing system along with the other two 
processing units. 

Dissimilar computational redundancy is used to prevent 
the introduction of generic faults in control system architec- 
tures. Generic faults refer to common errors in system redun- 
dancies. Such errors can occur in the design and development 
of the hardware and software elements within general pur- 
pose computers that are used in control system architectures. 
As such, dissimilar computational redundancy would entail 
each redundant hardware element using a dissimilar micro- 
processor and each redundant microprocessor executing soft- 
ware (e.g., operating system, application, etc.) that was devel- 
oped using a different programming language. 

Other methods that have been used to help ensure the 
continued operation of control systems include the use of 
dissimilar technology, distributed computation redundancy, 
equalization, and mid-value voting. Each of these methods, 
however, generally requires at least one processing unit to 
remain operational at all times to preserve state variables. 
While the control systems may remain operational if all but 
one of the processing units experience a soft fault and the 
correctly-operating unit can be identified, the control system
will not operate properly if all of the processors simulta- 
neously experience soft faults. Similarly, if a lone properly- 
operating unit cannot be identified within the system, the 
system will not recover, as there would be no identifiable 
operating unit with correct values for all of the state variables 
to be transfused to the remaining units. In addition, because of
the transfusion of state variables from other processing units, 
the system recovery may be relatively slow. It may therefore 
take an extended period of time for all processing units within 
the system to resume normal operation. In the meantime, 
redundant control is undesirably lost or degraded. 

In the aerospace field, digital flight control systems are 
frequently interposed between the pilot and the flight control 
surfaces of an aircraft. Such systems may include fly-by-wire,
auto-pilot, and auto-land systems. In a fly-by-wire system, in
lieu of pilot controls being mechanically coupled (e.g., via 
cables or hydraulics) to the various primary flight control 
surfaces of the aircraft (such as the ailerons, elevators, and 
rudder), the position and movements of a pilot’s controls are 
electronically read by sensors and transmitted to a computing 
system. The computing system typically sends electronic 
control signals to actuators of various types that are coupled 
to the primary flight control surfaces of the aircraft. The 
actuators are typically configured to move one or more con- 
trol surfaces according to inputs provided by the pilot, or in 
response to feedback measured by a sensor on the aircraft. 
Failure of the control system could thus have catastrophic 
effects on the aircraft. Similarly, industrial, medical, or other 
systems may be gravely affected by certain control system 
failures. 

In conventional flight control system (FCS) architectures, 
recovery from soft faults of FCS architectural elements, par- 
ticularly in the flight control computer, is either not possible, 
has to resort to recovery attempts after a grace period of time, 
or requires recycling of power, such as rebooting the computer.
Any of these circumstances can negatively impact the mean time
between unscheduled removals (MTBUR). In
addition, tight tolerance monitoring has been dependent on
synchronous operations for tight tracking of redundant elements,
and has been relatively federated and not easily scaleable.

High integrity digital flight control systems usually require 
incorporation of redundant elements to achieve required reliability.
Management of systems to provide maximum theoretically
possible availability of the redundant elements in the
presence of soft faults is difficult to achieve without requiring
close synchronization of the computing elements or other
technically difficult monitoring mechanisms.

BRIEF DESCRIPTION OF THE DRAWINGS 

Features of the present invention will become apparent to 
those skilled in the art from the following description with
reference to the drawings. Understanding that the drawings
depict only typical embodiments of the invention and are not
therefore to be considered limiting in scope, the invention will
be described with additional specificity and detail through the
use of the accompanying drawings, in which:

FIG. 1 is a schematic depiction of a digital control system 
that can employ the redundancy management features of the 
invention; 

FIG. 2 is a block diagram of a soft fault rapid recovery 
system that can be used in the digital control system of FIG. 1;

FIG. 3 is a block diagram of a command and recovery 
management system that can be used in the digital control 
system of FIG. 1; and

FIG. 4 is a block diagram of a time magnitude monitoring
method that can be used as part of the redundancy manage- 
ment method of the invention. 

DETAILED DESCRIPTION 

The present invention relates to a method and system for 
redundancy management of a distributed and recoverable 
digital control system. This invention uses unique redundancy
management techniques to achieve recovery and restoration
of redundant elements to full operation in an asynchronous
environment. Utilizing recoverable computing
elements and redundancy management that accommodates
recovery of redundant elements ensures maximum availability
of system resources in the presence of soft faults. The
redundancy management functions are distributed throughout
the architecture of the control system.

In the following description, various embodiments of the
present invention may be described in terms of various
architecture elements and processing steps. It should be
appreciated that such elements may be realized by any number
of hardware or structural components configured to perform
specified operations. For purposes of illustration only,
exemplary embodiments of the present invention will frequently
be described herein in connection with aircraft avionics.
The invention is not so limited, however, and the concepts
and devices disclosed herein may be used in any control
environment. Further, it should be noted that although various
components may be coupled or connected to other components
within exemplary system architectures, such connections
and couplings can be realized by direct connection
between components, or by connection through other components
and devices located therebetween. The following
detailed description is, therefore, not to be taken in a limiting
sense.

According to various exemplary embodiments of the
invention, a control system architecture suitably includes sufficient
computation redundancy and control command management
to either isolate and recover a faulted processor, or to
recover all processing units of the redundant system without 
adverse effects. Computational redundancy may be provided 
with multiple processors or processing units within a com- 
puter or computing platform. In addition to isolating and 
recovering from internal faults, various embodiments allow 
processing units to detect faults in other system elements such 
as sensors, adaptors, actuators and/or effectors. Further 
embodiments may also include one or more actuator adaptor 
units that, through the detection of adverse data errors, detect
faults in other system components (in addition to the
processing units) and issue discrete instructions to trigger a 
recovery. 

An exemplary control system architecture suitably 
includes multiple processors, each of which is configured for 
rapid recovery from various faults. The term “rapid recovery” 
indicates that recovery may occur in a very short amount of 
time. To maintain the operation of a control system, it is 
generally desirable that a recovery from a soft fault takes 
place within about 1 to 2 computing frames. As used herein, 
a “computing frame” is the time needed for a particular pro- 
cessing unit to perform a repetitive task of a computation, 
e.g., the tasks that need to be calculated continuously to 
maintain the operation of the controlled plant. In some 
embodiments, processor recovery is performed within about 
1 computing frame and redundancy recovery is performed 
within about 1 or 2 computing frames, or otherwise in a short 
enough time period so as to have only minimal effects, if any, 
on system performance. 

The ability of a processor to initiate recovery from a soft 
fault allows various embodiments of the present invention to 
aid in the recovery of the system as a whole. In addition, soft 
faults may be detected in the same computing frame or within 
several frames in which the faults occur. In embodiments 
wherein faults are detected within a single computing frame, 
each processor need only store control and logic state variable 
data for the immediately preceding frame for use in recovery 
purposes, which may take place essentially instantaneously. 
Accordingly, the dependence of each component upon other 
redundant components is suitably reduced. 

Instructions for carrying out the various methods, process 
tasks, calculations, control functions, and the generation of 
signals and other data used in the operation of the system of 
the invention are implemented, in some embodiments, in 
software programs, firmware, or computer readable instruc- 
tions. These instructions are typically stored on any appropri- 
ate computer readable medium used for storage of computer 
readable instructions or data structures. Such computer read- 
able media can be any available media that can be accessed by 
a general purpose or special purpose computer or processor, 
or any programmable logic device. 

By way of example, and not limitation, such computer 
readable media can include floppy disks, hard disks, ROM, 
flash memory ROM, nonvolatile ROM, EEPROM, RAM, 
CD-ROM, DVD-ROM, or other optical disk storage, mag- 
netic disk storage, or other magnetic storage devices, or any 
other medium that can be used to carry or store desired pro- 
gram code means in the form of computer executable instruc- 
tions or data structures. When information is transferred or 
provided over a network or another communications connec- 
tion (either hardwired, wireless, or a combination of hard- 
wired or wireless) to a computer, the computer properly views 
the connection as a computer readable medium. Thus, any 
such connection is properly termed a computer readable 
medium. Combinations of the above are also included within 
the scope of computer readable media. Computer executable 
instructions comprise, for example, instructions and data
which cause a general purpose computer, special purpose
computer, or special purpose processing device to perform a 
certain function or group of functions. 

The system of the invention will also be described in the 
general context of computer readable instructions, such as 
program modules, being executed by a processor. Generally, 
program modules include routines, programs, objects, data 
components, data structures, algorithms, etc. that perform 
particular tasks or implement particular abstract data types. 
Computer executable instructions, associated data structures, 
and program modules represent examples of a program code 
means for executing steps of the methods disclosed herein. 
The particular sequence of such executable instructions or 
associated data structures represents examples of correspond- 
ing acts for implementing the functions described in such 
steps. 

In one embodiment, the present invention provides a 
redundancy management method for the architectural ele- 
ments of a control system such as a primary flight control 
system where some elements can rapidly recover from soft 
faults. This method manages redundant commands, 
responses, and recoveries; uses the status from independent 
hardware monitors internal and external to a computing unit 
such as a flight control computer (FCC); and uses command 
blending and equalization provided by complementary archi- 
tecture elements comprised of high-integrity self-checking 
pairs of computing lanes within each computing unit and an 
actuator control unit (ACU). The ACU may be implemented
in hardware or software or a combination of both. The ACU 
performs fault isolation by comparing the commands to the 
blended value from all inputs. The equalization makes it 
possible to hold the comparison to a tight threshold in the 
monitor. The ACU isolates the faulted output and can com- 
mand recovery of a computing lane from a soft fault. Recov- 
ery can also be commanded by a computing unit of the control 
system based on a failure of a computing unit internal moni- 
tor. 

Referring now to FIG. 1, an exemplary scaleable architec- 
ture of a digital control system 100 that can employ the 
redundancy management functions of the invention includes 
a first computing unit 112 and a second computing unit 114. 
The computing units 112 and 114 can be any digital control 
device such as a digital computer or processor, and provide 
for redundancy in processing. Each computing unit 112, 114 
suitably includes one or more processing devices capable of 
executing multiple and/or simultaneous software processes. 
As shown, the computing units 112 and 114 can include 
real-time multi-tasking computing platforms such as a pri- 
mary flight control computer (PFCC). The PFCC can be an 
integrated modular computing platform (IMCP) with dual 
computing lanes. 

The computing units 112 and 114 provide input process- 
ing, sensor selection, control laws (e.g., pitch, yaw, and roll 
inner loops), monitoring (e.g., actuator and effector monitor- 
ing), equalization, rapid recovery, redundancy management, 
and any appropriate recovery triggers. Although control sys- 
tem 100 is shown with two computing units, additional com- 
puting units can be employed if desired. 

Each of the computing units 112 and 114 is in operative
communication with a multitude of actuator control units 
(ACUs) 116, 118, 120, and 122, which provide for actuator 
command (Cmd) management and have dual computing 
lanes. The ACUs perform command blending and selection, 
and use other redundant actuator command values while a 
computing platform such as a PFCC lane is recovering. The 
ACUs also perform monitoring of actuator command lanes, 
data concentration, and initiation of a selective and isolated
recovery trigger of each monitored application. The ACUs
can also be redundant per control axis. Although control
system 100 is shown with four ACUs, a varying number of
ACUs can be employed depending upon system requirements.
For example, in some embodiments three or more
ACUs can be employed in a control system according to the 
invention. 

Each ACU 116, 118, 120, and 122 is also in operative
communication with a respective one of smart actuators 124,
126, 128, and 130. An actuator is made “smart” when an
electronics module such as an electronic interface unit (EIU)
is added to the basic actuator. The smart actuators used in the
control system can be dual-channel, fail-passive, electromechanical
actuators, which contain two independent computational
lanes. The smart actuators receive actuator position
command signals from the ACUs. The smart actuators also
determine validity of commands from the computing unit
based on command validity flags and activity monitoring. The
smart actuators 124, 126, 128, and 130 are configured to
provide feedback to the respective ACU 116, 118, 120, and
122 related to actuator position information.

The smart actuators 124, 126, 128, and 130 can optionally 
be in operative communication with a respective servo or 
actuator device such as hydraulic actuators 132, 134, 136, and 
138. The hydraulic actuators 132, 134, 136, and 138 can be
respectively coupled to various control effectors 140, 141, 
142, and 143 such as, for example, various primary flight 
control surfaces of an aircraft (e.g., rudders, ailerons, and/or 
elevators). The control effectors 140-143 are configured to 
provide feedback to the respective ACU 116, 118, 120, and
122 related to effector position information. 

As depicted in FIG. 1, the computing units 112 and 114 
receive data inputs from sensor sets 150, 152, and 154, which 
can include air data, inertial data, or commands from an 
operator (e.g., pilot controls, etc.). The sensor sets can include
any number of gyroscopes, vehicle position sensors, airflow
sensors, temperature sensors, and/or other sensing devices as
may be appropriate for the particular implementation. A data
concentrator 156, 158, and 160 with a single lane can be
implemented between each sensor set 150, 152, 154 and
computing units 112 and 114. The data concentrators suitably
receive and concentrate data from the sensors to provide an
interface to computing units 112 and 114 as appropriate. The
data concentrators may also provide sensor validity monitoring
to ensure that the sensors remain active. Each of the
sensors may optionally include rapid recovery elements if 
available and desired for the particular implementation. 

The control system 100 can be suitably implemented, for 
example, as part of a digital flight control system to provide 
functions for the safe flight and landing of aerospace vehicles.
The control system 100 provides for independent recovery of
any computing lane, and all system elements can be executed
asynchronously. Also, control system 100 can accommodate
the asynchronous operation of dissimilar computational
redundancy. For example, the PFCC performs equalization of
surface positions by bringing diverging data back to the same
value or close to the same value. An actuator command management
voting algorithm accommodates asynchronous surface
command inputs such that the PFCC, ACU, and other
elements can execute asynchronously, and can accommodate
computational lanes using dissimilar computational redundancy.

The computing platform such as the PFCC provides a 
real-time multi-tasking computer system with rollback recovery
capability. The PFCC enables integration of functions,
and applications may selectively use the recovery function as
required. The recovery mechanism operation can be verifiable
using common built-in-test methods, which can be used
to verify operation of the recovery mechanism at any time.
The PFCC can also provide monitoring of ACU surface com- 
mands and surface positions. 

During operation of control system 100, computing units 
112, 114 receive input from sensor sets 150, 152, 154 via data 
concentrators 156, 158, 160. Each computing unit provides 
the appropriate data to each computational lane thereof, 
which operate as separate partitioned processing units. 
Accordingly, each data set from redundant sensor and com- 
mand data sets can be simultaneously processed in multiple 
isolated processing units. The command signals from each
lane of computing units 112, 114 propagate to each of the 
ACUs 116, 118, 120, and 122. The ACUs transmit the com- 
mand signals to the smart actuators 124, 126, 128, and 130, 
which then perform the requested commands as appropriate 
to control the hydraulic actuators 132, 134, 136, 138, and 
thereby the control effectors 140-143. During normal opera- 
tion, the output signals from each processing unit can be 
monitored internally, or externally by the ACUs, to ensure 
that each of the computing units 112, 114 is producing
results within a predetermined tolerance of the remaining 
computing units. 

Each processing unit of computing units 112, 114 is con- 
figured to be capable of rapid recovery from soft faults. To 
accomplish rapid recovery, each processing unit is configured 
to retrieve control and logic state variable data from internal 
memory locations such as a high integrity random access 
memory. Using the retrieved state variables and appropriate 
sensor data, each processing unit can fully recover from a soft 
fault relatively quickly without requiring a data transfusion 
from another computing unit. The rapid recovery cycle 
involves halting processor execution, instating state variables 
from a protected memory area, and starting execution again at 
an appropriate location in the program execution cycle. 
Through the use of command blending, equalization, or other 
techniques, it is not necessary to synchronize with the remain- 
ing processing units after initiating the rapid recovery cycle. 

For example, FIG. 2 is a block diagram of a soft fault rapid 
recovery system 200 that can be used in the digital control 
system of the invention. The recovery system 200 is imple- 
mented internally in each computing platform such as com- 
puting units 112 and 114 of control system 100 in FIG. 1. As 
shown in FIG. 2, a monitor 210 is provided that is in operative
communication with a central processing unit (CPU) 220 and 
a CPU 222. A memory unit 224 operatively communicates 
with CPU 220, and another memory unit 226 operatively 
communicates with CPU 222. Protected storage areas can be 
provided to store state variable data 250, 252. The protected 
storage areas can include high integrity memory cells such as 
disclosed in U.S. Pat. No. 6,163,480, which is incorporated 
herein by reference. 

During operation of recovery system 200, state variable 
data 250, 252 generated by CPU 220 and CPU 222 is stored in 
memory units 224 and 226, respectively. The state variable 
data is related to the state the CPU operates under for a given 
set of inputs and outputs. This data includes states generated 
by the computing hardware as well as states generated by the 
application software. The data is stored with respect to a given 
computing frame N several frames back in time (N-1, N-2, ...,
N-X) in the protected storage areas. If the CPU or a
memory element is upset to generate a soft fault, such as by 
interference from an EME signal, monitor 210 rapidly detects 
the soft fault and causes a recovery trigger 260 to initiate. This 
restores the state variable data saved from one of the previous 
computing frames and restarts the upset CPU with the most 
recent good data at the next starting computing frame. The
recovery system 200 allows the recovered CPU to start in a
time frame near to the time frame of the other CPUs in the 
system since no reboot of the system is necessary. 
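
The save-and-restore cycle described above can be illustrated with a short sketch. It is not the implementation of recovery system 200: the class name RollbackLane, the integrator control law, and the use of an in-memory deque as a stand-in for the protected storage area are all assumptions made for illustration only.

```python
from collections import deque
from copy import deepcopy


class RollbackLane:
    """Toy computing lane that checkpoints its state variables every frame."""

    def __init__(self, initial_state, depth=3):
        self.state = dict(initial_state)
        # Stand-in for the protected storage area: holds frames N-1 ... N-X.
        self.checkpoints = deque(maxlen=depth)

    def run_frame(self, step, inputs):
        """Save the current state, then compute the next frame."""
        self.checkpoints.append(deepcopy(self.state))
        self.state = step(self.state, inputs)
        return self.state

    def recover(self, frames_back=1):
        """Recovery trigger: reinstate the state saved `frames_back` frames ago."""
        if not 1 <= frames_back <= len(self.checkpoints):
            raise ValueError("no checkpoint that far back")
        self.state = deepcopy(self.checkpoints[-frames_back])
        return self.state


def integrator(state, inputs):
    """Hypothetical control law with a single integrator state."""
    return {"x": state["x"] + inputs}


lane = RollbackLane({"x": 0.0})
lane.run_frame(integrator, 1.0)   # x becomes 1.0
lane.run_frame(integrator, 1.0)   # x becomes 2.0
lane.state["x"] = 1e9             # simulated upset (soft fault)
lane.recover()                    # reinstates the state saved before the last frame
```

After the call to recover, the lane holds the state saved at the start of the upset frame and can simply re-run that frame with current inputs, with no data transfusion from another lane.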

A suitable related fault recovery system that can be used in 
the control system of the present invention is disclosed in
copending U.S. patent application Ser. No. 11/058,764 filed
on Feb. 16, 2005, which is incorporated herein by reference. 

During a rapid recovery cycle, the tolerance used in an 
ACU to determine if a processing unit is operating properly 
may be relaxed for that particular processing unit. For
example, during normal operation, there may be a predetermined
tolerance, within which each of the processing units is
expected to operate. If a processing unit produces values that
are not within the predetermined tolerance, that processing
unit may be determined to be suffering from a soft fault, and
a rapid recovery cycle may be initiated. During the rapid
recovery cycle, the predetermined tolerance for the affected
processing unit may be initially widened and then narrowed
over a predetermined time period such that further deviations
are acceptable until the processing unit resumes normal
operation. 
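
One way to picture the widening and subsequent narrowing of the tolerance is as a schedule indexed by the number of frames since recovery began. The sketch below assumes a linear ramp; the description does not specify the shape of the narrowing, and the function name and parameters are hypothetical.

```python
def recovery_tolerance(frames_since_recovery, nominal, relaxed, settle_frames):
    """Tolerance for a recovering lane, tightened linearly back to nominal.

    Assumption: a linear ramp from `relaxed` (at frame 0) to `nominal`
    (at `settle_frames`). Any monotonic schedule would serve equally well.
    """
    if settle_frames <= 0 or frames_since_recovery >= settle_frames:
        return nominal
    if frames_since_recovery <= 0:
        return relaxed
    fraction = frames_since_recovery / settle_frames
    return relaxed - (relaxed - nominal) * fraction


# Example: tolerance widened to 2.0, returning to 0.5 over 10 frames.
schedule = [recovery_tolerance(k, 0.5, 2.0, 10) for k in range(12)]
```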

Furthermore, the output of the processing unit may not be 
included in the derivation of the output from the ACU (e.g., 
computation of the mid-value) until the output comes within 
the relaxed tolerance. If the output comes within tolerance
(indicating that the computing unit has stabilized) within a
predetermined period of time, it may once again be included
in the output derivation. Before the predetermined time has
expired and the processing unit output has come within tolerance,
requests for placing the processing unit into a rapid
recovery state may be suspended or “masked” to allow the
processing unit to recover. Once the processing unit has stabilized
from the recovery cycle, it may be subjected to the
previous tolerances. If the output does not come within tolerance
within the predetermined time, another request to
place the processing unit into a rapid recovery state may be 
issued. 

In general, if the output of a recovered element falls outside 
of the relaxed tolerance following recovery, that computational
element is kept off-line until the system is restarted
(i.e., re-powered). Such a failure indicates that the recovery
was unsuccessful. While such failures are rare, this provides a means for
excluding a computational element that does not return to the
tolerance within a specified time period. The tolerance used
following recovery is tightened over a specific time period
until it reaches the original tolerance. 
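
The exclusion, masking, and off-line behavior described in the preceding paragraphs can be sketched as a small per-lane state machine. This is a simplified reading under stated assumptions: the class LaneMonitor, the max_attempts limit, and the choice to take a lane off-line once that limit is exhausted are illustrative and are not taken from the description.

```python
from enum import Enum


class LaneStatus(Enum):
    ACTIVE = "active"          # included in the blended output
    RECOVERING = "recovering"  # excluded; recovery requests are masked
    OFFLINE = "offline"        # excluded until the system is restarted


class LaneMonitor:
    def __init__(self, window_frames, max_attempts=2):
        self.window_frames = window_frames  # assumed recovery time limit
        self.max_attempts = max_attempts    # assumed retry limit
        self.status = LaneStatus.ACTIVE
        self.frames_in_recovery = 0
        self.attempts = 0

    def update(self, deviation, tolerance):
        """Advance one frame; return True if a recovery trigger should be sent."""
        within = abs(deviation) <= tolerance
        if self.status is LaneStatus.OFFLINE:
            return False
        if self.status is LaneStatus.ACTIVE:
            return False if within else self._start_recovery()
        # RECOVERING: further triggers are masked while the window is open.
        self.frames_in_recovery += 1
        if within:
            self.status = LaneStatus.ACTIVE  # readmit to the output derivation
            self.attempts = 0
            return False
        if self.frames_in_recovery < self.window_frames:
            return False
        return self._start_recovery()

    def _start_recovery(self):
        if self.attempts >= self.max_attempts:
            self.status = LaneStatus.OFFLINE
            return False
        self.attempts += 1
        self.frames_in_recovery = 0
        self.status = LaneStatus.RECOVERING
        return True
```

The caller is expected to pass the relaxed tolerance while a lane is recovering and the original tolerance otherwise, so the monitor itself stays independent of how the tolerance is scheduled.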

One technique for computing the blended control output
involves computing a “mid-value” in which the signals from
a processing unit are used to compute a mean and/or median
of all of the values produced. This mid-value is then compared
to each signal from each of the processing units in the system.
If a discrepancy exists between any particular value produced
by any lane and the mean and/or median of all the values (i.e.,
the mid-values), an error condition is detected and the
appropriate processing unit is commanded to initiate a rapid
recovery cycle. The discrepancy from the mid-values may be based
upon any tolerance value, which can in turn be adjusted based
upon desired conditions. The detection of discrepancy from a
mean or median value can be processed very rapidly, thus
potentially resulting in an identification of an error within one
or two computational frames of the value being produced.
Accordingly, differences from mid-values may be computed
based upon previous mid-values (i.e., values maintained from
a previous frame), or can be computed in real time as
appropriate.
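
As a minimal sketch of the mid-value comparison described above, the following uses the median as the mid-value and flags any lane whose command differs from it by more than a tolerance. The lane names, the tolerance value, and the function name are hypothetical; the description leaves the choice between mean and median open.

```python
from statistics import median


def find_discrepant_lanes(commands, tolerance):
    """Return (mid_value, errant_lanes) for one computational frame.

    commands maps a lane identifier to the command value it produced.
    Assumption: the median is used as the mid-value.
    """
    mid_value = median(commands.values())
    errant_lanes = [
        lane
        for lane, value in commands.items()
        if abs(value - mid_value) > tolerance
    ]
    return mid_value, errant_lanes


# Example: lane "C" has drifted and would be commanded to recover.
mid, errant = find_discrepant_lanes(
    {"A": 1.00, "B": 1.02, "C": 1.75, "D": 0.99}, tolerance=0.1
)
```

Each lane in `errant` corresponds to a processing unit that would be commanded to initiate a rapid recovery cycle.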

Alternatively, when one or more of the ACUs 116, 118,
120, 122 sense that one of the computing units 112, 114 is
not supplying signals that lie within certain tolerances, the
ACUs may transmit a signal to the computing unit in question
to request the start of a rapid recovery cycle for a particular
processing unit.

The computing units of the control system are configured
to perform redundancy management actions such as
equalization, in which the control signal generated by each
processing unit is driven toward fine adjustments in the computed
mid-value, so that the signals produced by each processing
unit result in an equalized control command. Such
implementations typically do not require tight synchronization between
the various processing units to achieve “equalized” command
values because each command signal is driven toward the
other signals (i.e., toward a mid-value).

An equalization signal is derived from feedback of control 
effector positions and is used to cancel out drift in the surface 
commands, preventing divergence of computed surface com- 
mands. This allows tight monitoring and comparison of the 
command signals in the ACU and the rapid detection of 
computing element errors in time to command recovery 
before state variable values are permanently lost. 
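
The effect of equalization can be sketched as a small per-frame correction that nudges each lane's command toward a common reference. The description says the equalization signal is derived from control effector position feedback and that commands are driven toward a mid-value; the proportional gain, the step limit, and the names below are assumptions chosen only to illustrate the idea.

```python
def equalize(commands, reference, gain=0.2, max_step=0.05):
    """Apply one frame of equalization to every lane's command.

    reference is the common value the lanes are driven toward (for
    example a mid-value or an effector position feedback value).
    Assumptions: a proportional correction with a bounded step size.
    """
    equalized = {}
    for lane, value in commands.items():
        step = gain * (reference - value)
        step = max(-max_step, min(max_step, step))  # keep adjustments fine
        equalized[lane] = value + step
    return equalized


# Two asynchronous lanes that have drifted apart converge over frames.
lanes = {"A": 1.00, "B": 1.10}
for _ in range(50):
    lanes = equalize(lanes, reference=1.05)
```

Because every lane is pulled toward the same reference, the lanes stay close without frame-level synchronization, which is what permits tight comparison thresholds in the ACU.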

Use of the equalization method allows asynchronism of the 
control computation elements so that the implementation of 
the elements may be either similar or dissimilar as called for 
in order to meet reliability and availability requirements. For 
example, different types of processors may be employed in 
the computing units if desired. The equalization scheme also 
accommodates errant effector positions.

By implementing command blending and equalization, 
rapid recovery of redundant elements is provided for in a 
transparent, seamless way. That is, any data errors in any 
processing unit do not propagate through the system to 
adversely affect the control function. If one processing unit 
encounters a soft fault and proceeds to a recovery cycle, the 
remaining operating processing units are unaffected by the 
recovery cycle and the operation of the control system as a 
whole is unaffected. 

FIG. 3 is a block diagram of a command and recovery 
management system 300 that illustrates a portion of the 
redundancy management system of the invention used with a 
digital control system. As shown, management system 300 is 
scalable to any number of redundant computing units with 
internal monitors that operatively communicate with a 
respective actuator control manager of an ACU that provides 
an external monitor for the computing units. This allows for 
ease in adding extra redundancy to the control system, adding 
extra control effectors, and adding extra control functions. 

As depicted in FIG. 3, a computing unit 310 such as a
PFCC includes a processor or computer 1a with recovery
available, and an internal monitor 1b that provides a recovery
trigger 1c for computer 1a. The computing unit 310 is in
operative communication with an ACU 320 having an actuator
control manager 1d that provides mid-value voting and
monitoring of command lanes. An external recovery trigger
1e is provided such that ACU 320 can initiate the recovery of
computing unit 310. A redundant computing unit 350
includes a processor or computer (Na) and an internal monitor
(Nb) that provides a recovery trigger (Nc) for computer
(Na). The computing unit 350 is in operative communication
with an ACU 360 having an actuator control manager (Nd)
that provides mid-value voting and monitoring of command
lanes. An external recovery trigger (Ne) is also provided.

The computing unit 310 is also in operative communication
with ACU 360, which provides a redundant external
recovery trigger (Ne) to computing unit 310. Likewise,
computing unit 350 is in operative communication with ACU 320,
which provides a redundant external recovery trigger 1e to
computing unit 350. Redundancy is provided by the external
recovery triggers since each of the ACUs can initiate
computer recovery, because the ACUs calculate a voted value
independently.

During operation of the redundancy management method
employed in management system 300, a command signal 1f is
sent from computer 1a to ACU 320 and ACU 360. If either
ACU detects an errant command (for instance an errant
command caused by a soft fault), external recovery triggers 1e or
(Ne) can be initiated for computer 1a recovery. Likewise, a
command signal (Nf) from computer (Na) is sent to ACU 360
and ACU 320, which can initiate external recovery triggers
(Ne) and 1e, respectively, for soft fault recovery of computer
(Na). The internal monitors 1b and (Nb) can also initiate
recovery of computers 1a and (Na) through internal recovery
triggers 1c and (Nc), respectively. If an ACU or the internal
monitors sense a hard fault (i.e., an unrecoverable fault), then
the appropriate computing unit may be shut down or isolated
as appropriate.

Sensor, actuator, control effector, and time magnitude
monitoring support the redundancy management method of
the invention. Sensor source selection is performed based on
data freshness and reasonableness monitoring. In time
magnitude monitoring, an ACU blended command output is
compared against the computed effector command for each
control axis and deviations beyond an established limit for the
specified time are reported for further redundancy
management action. In this redundancy management method, the
smart actuators are commanded to ignore or disregard the
ACU commands when the deviations are beyond the
established limit.

The actuator position feedback is also checked against a
selected ACU blended command. A failure of the actuator is
reported if checks of the actuator position feedback versus the
ACU blended command exceed a prescribed limit for the
specified time. In this redundancy management method,
actuators are commanded to restart if checks of a control
effector (e.g., aileron, elevator, etc.) position feedback against
the ACU blended command exceed a prescribed limit for the
specified time. In this redundancy management method, the
actuators are commanded to shut down if restarting fails to
correct the position error. By employing these comparison
functions it is possible to determine where a fault is located.

FIG. 4 is a block diagram of an exemplary time magnitude
monitoring method such as described above that can be used
to support the redundancy management methodology of the
invention. A processing unit of the control system performs a
control law computation 410 and sends a computed surface
command to a compare module 420 for comparison with an
ACU(1) voted actuator command. In this redundancy
management method, a smart actuator is commanded to ignore the
ACU(1) voted actuator command when deviations between
the ACU(1) voted command output and the computed surface
command are beyond an established limit for a specified time
424. The ACU voted actuator commands for ACU(1),
ACU(2) . . . ACU(n) are sent to a select module 430, and the
selected command is sent to a compare module 440 for
comparison with an actuator surface position feedback signal.
When deviations between the selected voted actuator
command output and the actuator surface position feedback signal
exceed an established limit for a specified time 450, the
actuator is commanded to shut down in this redundancy
management methodology.
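
A time magnitude monitor of the kind shown in FIG. 4 can be sketched as a deviation check combined with a persistence counter. The class name, the frame-count representation of the "specified time", and the choice to reset the counter on any in-limit frame are assumptions for illustration only.

```python
class TimeMagnitudeMonitor:
    """Trips when a deviation exceeds a limit for a number of frames.

    Assumptions: the specified time is expressed in consecutive
    computational frames, and one in-limit frame resets the count.
    """

    def __init__(self, limit, persistence_frames):
        self.limit = limit
        self.persistence_frames = persistence_frames
        self.count = 0

    def update(self, command, reference):
        """Process one frame; return True once the monitor has tripped."""
        if abs(command - reference) > self.limit:
            self.count += 1
        else:
            self.count = 0
        return self.count >= self.persistence_frames


# Compare an ACU voted command against the computed surface command.
monitor = TimeMagnitudeMonitor(limit=0.5, persistence_frames=3)
voted_commands = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
tripped = [monitor.update(v, reference=0.0) for v in voted_commands]
```

The same monitor shape applies to the second comparison in FIG. 4, with the selected voted command and the actuator surface position feedback as the two inputs.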

Various redundancy management actions can be performed
by an ACU, such as validating redundant control
effector commands from computing units such as FCCs,
performing mid-value voting of redundant control effector
commands, and commanding recovery of any computing unit that
exceeds an error threshold. The ACUs exclude from the voting
surface commands a recovering computing unit during a
specified recovery period (i.e., grace period), and permanently
exclude surface commands from a computing unit that
fails to recover during a specified time interval. An ACU
detects attempts of other ACUs to control the same actuator,
stops attempting to control the same actuator as another ACU,
and attempts to establish control of an actuator that is not
being driven by any other ACU. The ACU also reports actuator
status to a computing unit to allow fault isolation actions
by the computing unit.
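
The ACU voting behavior described above (recovery command, temporary exclusion during a grace period, permanent exclusion on failure to recover) can be sketched as a small per-lane state machine. The state names, the frame-based grace period, and the rule that a recovering lane is re-admitted as soon as it is back within the threshold are simplifying assumptions, not details given in the description.

```python
from enum import Enum
from statistics import median


class LaneState(Enum):
    ACTIVE = "active"
    RECOVERING = "recovering"
    LOCKED_OUT = "locked_out"


class AcuVoter:
    """Hypothetical ACU voter with grace-period exclusion."""

    def __init__(self, lanes, threshold, grace_frames):
        self.threshold = threshold
        self.grace_frames = grace_frames
        self.state = {lane: LaneState.ACTIVE for lane in lanes}
        self.frames_in_recovery = {lane: 0 for lane in lanes}

    def step(self, commands):
        """Process one frame; return (voted_command, recovery_triggers)."""
        active = [
            value for lane, value in commands.items()
            if self.state[lane] is LaneState.ACTIVE
        ]
        if not active:
            return None, []
        voted = median(active)  # recovering and locked-out lanes excluded
        triggers = []
        for lane, value in commands.items():
            in_tolerance = abs(value - voted) <= self.threshold
            if self.state[lane] is LaneState.ACTIVE and not in_tolerance:
                self.state[lane] = LaneState.RECOVERING
                self.frames_in_recovery[lane] = 0
                triggers.append(lane)  # command recovery of this unit
            elif self.state[lane] is LaneState.RECOVERING:
                self.frames_in_recovery[lane] += 1
                if in_tolerance:
                    self.state[lane] = LaneState.ACTIVE
                elif self.frames_in_recovery[lane] >= self.grace_frames:
                    self.state[lane] = LaneState.LOCKED_OUT
        return voted, triggers
```

With three or more lanes, a single errant lane does not move the median on the frame in which it is first detected, so the voted command remains usable while the trigger is issued.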

Redundancy management actions are also performed by 
the smart actuators, such as validating all incoming ACU 
command messages, and positioning an output drum only in 
response to identical valid commands on both input channels.
The smart actuators also respond to an ACU with a status 
message each time dual valid command messages are 
received, and disengage when invalid or no valid command 
message is received for the timeout period. 
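
The smart actuator behavior described above can be sketched as follows. Message validation itself is abstracted away: an invalid or missing message is represented by `None`. The class name, the frame-based timeout, and the contents of the status message are assumptions for illustration.

```python
class SmartActuator:
    """Hypothetical smart actuator with dual-channel command checking."""

    def __init__(self, timeout_frames):
        self.timeout_frames = timeout_frames
        self.frames_without_valid = 0
        self.engaged = True
        self.position = 0.0

    def step(self, channel_a, channel_b):
        """Process one frame of commands from the two input channels.

        Each argument is a command value, or None when the message on
        that channel is missing or failed validation. Returns a status
        message when dual valid commands were received, else None.
        """
        if not self.engaged:
            return None
        if channel_a is not None and channel_b is not None:
            self.frames_without_valid = 0
            if channel_a == channel_b:
                # Move only on identical valid commands on both channels.
                self.position = channel_a
            return {"engaged": True, "position": self.position}
        self.frames_without_valid += 1
        if self.frames_without_valid >= self.timeout_frames:
            self.engaged = False  # disengage after the timeout period
        return None
```

Returning the status only on dual valid commands mirrors the description, and gives the ACU a per-frame indication that its commands are being accepted.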

The redundancy management method and system of the 
invention have the following characteristics that allow rapid 
recovery of redundant elements of the control system. The 
recovery of a computational element is selective depending 
upon which computation is in error. Successfully recovered 
computational elements are allowed to be gracefully restored 
to the original system configuration. In order to allow an 
element to recover and gracefully re-enter the system, moni- 
tors (both internal and external) adjust upon detection of a 
recovery so as not to lock out an element until it has had a chance to
recover. Management of redundant elements both during and 
after recovery of a computational element from soft faults is 
provided. 

Hard failures are not masked by the recovery and redun- 
dancy management methods. Recovery counters and timeout 
monitors ensure that if recoveries are not successful, faulted 
elements are permanently locked out from further operation. 
Further, the redundancy management functions are distrib- 
uted throughout the control system so that multiple elements 
can be in recovery at any one time. 

Recovery trigger outputs are distributed among redundant 
elements in the control system. Multiple external recovery 
triggers to a computing platform element within one compu- 
tational frame time are managed to initiate only a single 
recovery action in that element. The redundancy management 
system is scalable in that there is no dependency in the sort- 
ing, voting, or monitoring algorithms that would exclude a 
greater number of computational elements from being man- 
aged. 
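
The handling of multiple external recovery triggers within one computational frame can be sketched as a latch that collects triggers and starts at most one recovery per frame. The class and method names are hypothetical, and the end-of-frame evaluation point is an assumption.

```python
class RecoveryTriggerLatch:
    """Coalesces recovery triggers raised within one frame."""

    def __init__(self):
        self.pending_sources = set()
        self.recoveries_started = 0

    def raise_trigger(self, source):
        """Record a trigger from an internal monitor or an ACU."""
        self.pending_sources.add(source)

    def end_of_frame(self):
        """Start a single recovery if any trigger was raised this frame."""
        started = bool(self.pending_sources)
        if started:
            self.recoveries_started += 1
        self.pending_sources.clear()
        return started


# Three monitors detect the same soft fault in the same frame.
latch = RecoveryTriggerLatch()
latch.raise_trigger("ACU 320")
latch.raise_trigger("ACU 360")
latch.raise_trigger("internal monitor")
started = latch.end_of_frame()
```

Because the latch is independent of how many sources raise triggers, adding further ACUs or monitors does not change its logic, consistent with the scalability noted above.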

When the redundancy management functions are imple- 
mented as part of a digital control system used in the context 
of aerospace flight control, the control system performs the 
safety functions necessary for safe flight and landing of air- 
craft. The control system architecture maintains a pilot con- 
nection to control surfaces at all times such that the pilot has 
the last action. The recovery management provided by the 
control system does not compromise aircraft stability, and 
recovery management ensures that fault events and recoveries 
are transparent to aircraft function. 

The control system architecture also supports multiple 
recoveries of redundant flight control elements from multiple 
monitors in real time. This is provided by using multiple 
redundant ACUs per control axis and multiple redundant 
external recovery paths. The recovery management of the 
control system does not compromise aircraft stability, since 
recovery of each computing lane occurs before an aircraft
effect is produced. Even if all computing lanes needed to be
recovered, such a recovery would occur before an aircraft
effect would be produced. The additional layers of actuator
command processing within the control system ensure that
no aircraft effect is produced during and after recovery. By
employing recovery management of redundant elements
during/after FCC recovery from soft faults, the actuator
command management can use other redundant actuator
command values while the FCC is recovering.

In order not to mask hard faults when soft fault recovery is
part of a computing environment implementing a flight
control function, some form of monitoring for number of
recoveries should be provided. Along with keeping track of the
number of recoveries, monitoring criteria should, at a
minimum, limit the time in which some maximum number of
recoveries are allowed for systems that are flight critical,
particularly flight control systems. The multiple redundant
elements in the control system provide hard fault
management/containment. Using distributed redundancy
management ensures that a recurring FCC fault is eventually treated
as a hard fault since a hard fault or failure is not inadvertently
masked by recovery. For example, a recovery retry counter
can be used to ensure that a recurring FCC fault is eventually
treated as a hard failure. The FCC will be taken off-line if
excessive recovery retries occur in too short a time.
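
A recovery retry counter of the kind mentioned above can be sketched as a sliding time window over recovery events. The class name, the window representation in seconds, and the specific limits in the example are assumptions; the description does not give numeric values.

```python
from collections import deque


class RecoveryRetryMonitor:
    """Takes a unit off-line if it recovers too often in too short a time."""

    def __init__(self, max_recoveries, window_s):
        self.max_recoveries = max_recoveries
        self.window_s = window_s
        self.times = deque()
        self.offline = False

    def record_recovery(self, now_s):
        """Record a recovery at time now_s; return True if off-line."""
        self.times.append(now_s)
        # Drop recoveries that have aged out of the window.
        while self.times and now_s - self.times[0] > self.window_s:
            self.times.popleft()
        if len(self.times) > self.max_recoveries:
            self.offline = True  # recurring fault treated as a hard fault
        return self.offline


# Four recoveries within ten seconds exceed a limit of three.
monitor = RecoveryRetryMonitor(max_recoveries=3, window_s=10.0)
results = [monitor.record_recovery(t) for t in (0.0, 1.0, 2.0, 3.0)]
```

Once `offline` is set it is never cleared in this sketch, matching the statement that faulted elements are permanently locked out.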

The present invention may be embodied in other specific
forms without departing from its essential characteristics.
The described embodiments and methods are to be considered
in all respects only as illustrative and not restrictive. The
scope of the invention is therefore indicated by the appended
claims rather than by the foregoing description. All changes
that come within the meaning and range of equivalency of the
claims are to be embraced within their scope.

What is claimed is: 

1. A method for redundancy management comprising:

providing a plurality of computing units each comprising:
a plurality of redundant processing units for generating
one or more redundant control commands; and
one or more internal monitors for detecting one or more
data errors in the control commands;

providing a plurality of actuator control units having a pair
of redundant computational lanes for analyzing control
commands and providing feedback to the processing
units; and

initiating a selective and isolated recovery of one or more
monitored applications in a processing unit of the
plurality of processing units while one or more other
applications remain undisturbed in the processing unit when:
one or more data errors are detected in the one or more
monitored applications by one or more of the internal
monitors; or

one or more data errors are detected in the one or more
monitored applications by one or more of the actuator
control units;

wherein the recovery restores the most recent error-free
congruent set of state data for any one or more of the
selected applications simultaneously.

2. The method of claim 1, further comprising:

computing a blended command for one or more of the
control commands; and

initiating recovery in one or more of the applications when
the difference between a control command generated by
a processing unit and the blended command exceeds a
threshold value.

3. The method of claim 2, further comprising:

excluding from the blended command a processing unit
command for a recovering application for a specified
recovery period;
wherein the recovering processing unit command is not
permanently excluded from the blended command
before the specified recovery period has expired, and
wherein the recovering processing unit command is
permanently excluded from the blended command when
the recovering application fails to correctly recover
during the specified recovery period.

4. The method of claim 2, further comprising:

analyzing the control commands generated by each of the
processing units; and

adjusting the computed blended command so that the
control commands generated by the processing units result
in a substantially equalized control command.

5. The method of claim 1, further comprising:

comparing a blended command output with a computed
command;

disregarding the blended command output when deviations
between the blended command output and the
computed command exceed an established limit for a
specified time;

selecting a blended command output from a plurality of
blended command outputs;

comparing the selected blended command output with an
actuator position feedback signal; and

initiating an actuator shutdown command when deviations
between the selected blended command output and the
actuator position feedback signal exceed an established
limit for a specified time.

6. The method of claim 1, wherein the method is
implemented in redundancy management of a recoverable digital
control system.

7. The method of claim 1, wherein the method is imple- 
mented in redundancy management of a computing platform. 

8. The method of claim 1, wherein the method is
implemented in redundancy management of one or more actuator
control units.

9. The method of claim 1, wherein the method is
implemented in redundancy management of one or more smart
actuators.

10. The method of claim 1, wherein the method provides 
coordinated redundant control of actuators by the actuator 
control units. 

11. The method of claim 1, wherein the method validates
and responds to each valid command from the actuator
control units, and safely disengages an actuator when no valid
command is received during an established limit for a
specified time.

12. The method of claim 6, wherein redundancy
management actions are distributed throughout the system so that
multiple elements of the system can be in a recovery state at
any one time.

13. The method of claim 1, wherein redundancy
management actions support multiple recoveries of redundant
elements from multiple monitors in real time.

14. The method of claim 1, further comprising duplicating
state variable data stored in one or more memory devices in
the computing units.

15. The method of claim 6, further comprising restoring a 
duplicate set of state variable data when a soft fault is detected 
so that one or more processing units can resume processing 
using the duplicate set of state variable data. 

16. The method of claim 10, wherein at least one of the 
actuator control units detects attempts of other actuator con- 
trol units to control the same actuator, and stops attempting to 
control the same actuator as another actuator control unit, 
while attempting to establish control of an actuator that is not 
being driven by any other actuator control unit. 

17. The method of claim 10, wherein at least one of the 
actuator control units reports actuator status to a computing 
unit to allow fault isolation actions by the computing unit. 

18. A method for redundancy management comprising: 

providing a plurality of computing units each comprising: 

a plurality of redundant processing units for generating 
one or more redundant control commands; and 
one or more internal monitors for detecting one or more 
data errors in the control commands; 
wherein the processing units each include a recovery 
mechanism, the recovery mechanism comprising: 
a duplicate memory; 

an even frame memory, wherein the recovery mecha- 
nism is configured to duplicate state variables com- 
puted during even computational frames into the 
even frame memory; and 

an odd frame memory, wherein the recovery mecha- 
nism is configured to duplicate state variables com- 
puted during odd computational frames into the 
odd frame memory; 

wherein the even frame memory and the odd frame 
memory toggle back and forth duplicating state vari- 
ables into the duplicate memory for computational 
frames in which no fault is detected; 

providing a plurality of actuator control units having a pair 
of redundant computational lanes for analyzing control 
commands and providing feedback to the processing 
units; and 

initiating a selective and isolated recovery of one or more 
monitored applications in a processing unit of the plu- 
rality of processing units while one or more other appli- 
cations remain undisturbed in the processing unit when: 
one or more data errors are detected in the one or more 
monitored applications by one or more of the internal 
monitors; or 

one or more data errors are detected in the one or more 
monitored applications by one or more of the actuator 
control units; 

wherein the recovery mechanism creates and restores an 
error-free congruent set of state data for any of the one or 
more monitored applications selected for recovery. 





